Crowdworker Filtering with Support Vector Machine

Authors

  • Hohyon Ryu
  • Matthew Lease
Abstract

Crowdsourcing has been recognized as a possible technique to complement costly user studies, usability studies, relevance judgments for information retrieval studies, and training set construction for automatic document classification. However, the quality of crowdworkers varies with many factors, and we often cannot tell immediately whether their answers are right or wrong because gold standard answers are lacking. In this paper, we present a machine-learning-based crowdworker filtering technique that can be used to assess workers immediately after they finish their assigned tasks. A Support Vector Machine (SVM)-based crowdworker filter, called the Smart Crowd Filter (SCFilter), predicts the probability that each label is correct and identifies those crowdworkers who consistently provide answers that are unlikely to be correct. To verify the performance of the SCFilter, we performed a bad worker detection simulation test and an experiment in an actual crowdsourcing environment on the Amazon Mechanical Turk (AMT) website. In the simulation test, bad worker detection performance was assessed in terms of precision and recall. In the AMT experiment, a statistically significant improvement was observed for automatic document classification.
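The abstract does not spell out how per-label probabilities are turned into a worker-level decision, so the following is only a minimal sketch under stated assumptions. It assumes some classifier (for example, a probability-calibrated SVM) has already produced a correctness probability for each label, flags a worker when the mean probability over that worker's labels falls below a threshold, and then scores the flags with precision and recall as in a simulation test. The function names, the mean-based aggregation, the 0.5 threshold, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def flag_workers(label_probs, threshold=0.5):
    """Return the set of workers whose labels look consistently unreliable.

    label_probs: iterable of (worker_id, probability_that_label_is_correct).
    Assumption: a worker is flagged when the MEAN probability over their
    labels is below `threshold`; the paper's actual rule may differ.
    """
    per_worker = defaultdict(list)
    for worker_id, prob in label_probs:
        per_worker[worker_id].append(prob)
    return {
        worker_id
        for worker_id, probs in per_worker.items()
        if sum(probs) / len(probs) < threshold
    }


def precision_recall(flagged, truly_bad):
    """Precision and recall of the flagged set against known bad workers."""
    true_positives = len(flagged & truly_bad)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truly_bad) if truly_bad else 0.0
    return precision, recall


# Toy data (hypothetical): probabilities a classifier might assign per label.
label_probs = [
    ("w1", 0.9), ("w1", 0.8),
    ("w2", 0.2), ("w2", 0.4),
    ("w3", 0.45), ("w3", 0.35),
    ("w4", 0.7), ("w4", 0.6),
]

flagged = flag_workers(label_probs)
# In a simulation the bad workers are known in advance (hypothetical here).
precision, recall = precision_recall(flagged, truly_bad={"w2", "w4"})
print(sorted(flagged), precision, recall)
```

In this toy run one of the two flagged workers is truly bad and one of the two bad workers is caught, which illustrates why both precision and recall are reported: raising the threshold tends to catch more bad workers at the cost of flagging more good ones.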


Similar resources

Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets

Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...

A Wavelet Support Vector Machine Combination Model for Daily Suspended Sediment Forecasting

In this study, a wavelet support vector machine (WSVM) model is proposed for daily suspended sediment (SS) prediction. The WSVM model combines two methods: discrete wavelet analysis and support vector machine (SVM). The developed model was compared with a single SVM. Daily discharge (Q) and SS data from the Yadkin River at Yadkin College, NC station in the USA were used. I...

Application of Genetic Algorithm Based Support Vector Machine Model in Second Virial Coefficient Prediction of Pure Compounds

In this work, a Genetic Algorithm-boosted Least Squares Support Vector Machine model, an improved version of the Support Vector Machine model that solves a set of linear equations instead of a quadratic program, was used to estimate the second virial coefficient of 98 pure compounds. The compounds were classified into different groups. The best parameters were obtained by the Genetic Algorithm method ...

Identification areas with inundation potential for urban runoff harvesting using the support vector machine model

Rainfall-runoff from urban areas is one of the available water resources, but it is wasted due to a lack of attention and proper management. In addition, urban runoff in excess of drain capacity causes many problems, including inundation and urban environmental pollution. Therefore, harvesting this runoff can provide a part of the required water in urban areas, and also reduce flood and urban inund...

Fault diagnosis in a distillation column using a support vector machine based classifier

Fault diagnosis has always been an essential aspect of control system design. It is necessary because of the growing demand for increased performance and safety of industrial systems. The support vector machine classifier is a new technique based on statistical learning theory and is designed to reduce structural bias. Support vector machine classification in many applications in v...

Publication date: 2011